-
Notifications
You must be signed in to change notification settings - Fork 673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fallback to bootstrapping #3480
base: master
Are you sure you want to change the base?
Conversation
The lifecycle of an avalanchego node can be separated to three main stages: Stage I - If supported by the VM, then a state sync occurs. Otherwise, this stage is skipped. In the X-chain, this stage is entirely substituted by a custom bootstrapping phase that replicates and executes the X-chain DAG. Stage II - A stage called bootstrapping, which entails replicating the blocks from other avalanchego nodes, and executing them. Stage III - The final form of an avalanchego node, in which it runs the snowman consensus protocol. The phases are implemented by components called "engines": - avalanche bootstrapping - snowman - state syncer - snowman bootstrapper And the handler in snow/networking/handler/ is responsible to route messages from the network into the correct engine. Engines all implement the same common.Engine interface, but the interface consists of the union of all operations across all engines. Indeed, it is often the case that a message of type `m` dispatched by engine `e`, cannot be processed by a different engine `e'. For instance, a Chits message cannot be processed by any engine other than consensus, and a message about a query of state summary is only relevant to the state sync engine. To that end, each engine simply implements a no-op dispatcher for messages it does not care about. The biggest problem with the existing aforementioned structure, is that the lifecycle of the engines imposes a strict one way movement across the various stages, and there is no single component which consolidates the transition among the stages. The movement between the various stages takes place by a callback given to each engine at every stage but the final one (Stage I and Stage II). The structure therefore makes it difficult to introduce movement from snowman consensus back to bootstrapping / state sync, or to have better control over the message dispatch. This commit unifies all engines into a single one under snow/engine/unified As a result, the implementation of the handler in snow/networking/handler/handler.go is now simpler, as it only interacts with the unified engine, and it never needs to query the snow.EngineState. The state transition between the various stages is now taken care of by the unified engine, and since the code to dispatch messages to the right engine is now all in the unified engine, it's not only more testable but it lets us move among the stages in the same place where we consider the stage we're in to dispatch the message to the correct engine. Signed-off-by: Yacov Manevich <[email protected]>
Signed-off-by: Joshua Kim <[email protected]>
Co-authored-by: Darioush Jalali <[email protected]>
This commit adds a mechanism that detects that our node is behind the majority of the stake. The intent is to later have this mechanism be the trigger for the bootstrapping mechanism. Currently, the bootstrapping mechanism is only active upon node boot, but not at a later point. The mechanism works in the following manner: - It intercepts the snowman engine's Chits message handling, and upon every reception of the Chits message, the mechanism that detects if the node is a straggler (a node with a ledger height behind the rest) may be invoked, if it wasn't invoked too recently. - The mechanism draws statistics from the validators known to it, and computes the latest accepted block for each validator. - The mechanism then proceeds to determine which blocks are pending to be processed (a block pending to be processed was not accepted). - The mechanism then collects a snapshot of all blocks it hasn't accepted yet, and the amount of stake that has accepted this block. - The mechanism then waits for its next invocation, in order to see if it has accepted blocks correlated with enough stake. - If there is too much stake that has accepted blocks by other nodes correlated to it that the node hasn't accepted, then the mechanism announces the node is behind, and returns the time period between the two invocations. - The mechanism sums the total time it has detected the node is behind, until a sampling concludes it is not behind, and then the total time is nullified. Signed-off-by: Yacov Manevich <[email protected]>
Signed-off-by: Yacov Manevich <[email protected]>
Signed-off-by: Yacov Manevich <[email protected]>
Signed-off-by: Yacov Manevich <[email protected]>
|
Yeah I need to tidy up the commit chain. I didn't mean to post this for review, I forgot to put it in draft mode.
I can split the PRs to pieces, that's not an issue. I just got the PR you mention working two days ago, so I didn't think it would be reviewed by anyone anytime soon :-) |
3eb6fb1
to
c906bc3
Compare
Signed-off-by: Yacov Manevich <[email protected]>
c906bc3
to
9d8a7b1
Compare
This PR has become stale because it has been open for 30 days with no activity. Adding the |
Why this should be merged
How this works
How this was tested